Abstract

This paper considers the average complexity of maximum likelihood (ML) decoding of convolutional codes. ML decoding can be modeled as finding the most probable path taken through a Markov graph. Integrated with the Viterbi algorithm (VA), complexity reduction methods such as the sphere decoder often use the sum log likelihood (SLL) of a Markov path as a bound to disprove the optimality of other Markov path sets and consequently to avoid exhaustive path search. In this paper, it is shown that SLL-based optimality tests are inefficient if one fixes the coding memory and takes the codeword length to infinity. Alternatively, the optimality of a source symbol at a given time index can be tested using bounds derived from the log likelihoods of the neighboring symbols. It is demonstrated that such neighboring log likelihood (NLL)-based optimality tests, whose efficiency does not depend on the codeword length, can bring significant complexity reduction to ML decoding of convolutional codes. The results are generalized to ML sequence detection in a class of discrete-time hidden Markov systems.

Index Terms: coding complexity, convolutional code, hidden Markov model, maximum likelihood decoding, Viterbi algorithm

I. Introduction 

We study algorithms that reduce the average complexity of maximum likelihood (ML) decoding of convolutional codes. By ML decoding, we mean that the decoder uses code search to find, and to guarantee the output of, the most likely codeword.

Forney showed that ML decoding of convolutional codes is equivalent to finding the most probable path taken through a Markov graph [1]. Denote the codeword length by $N$ and the coding memory by $\nu$. For each time index, the number of Markov states in the Markov graph is exponential in $\nu$. The total number of Markov states is therefore exponential in $\nu$ but linear in $N$. Define the complexity of a decoder as the number of visited Markov states normalized by the codeword length $N$. Practical ML decoding is often achieved using the Viterbi algorithm (VA) [2][1], whose complexity does not scale in $N$ but scales exponentially in $\nu$. Well-known decoders such as the list decoders [3], the sequential decoders [4], and the iterative decoders [5] are able to achieve near-optimal error performance with low average complexity. However, these decoders do not guarantee the output of the ML codeword [6].

If obtaining the ML codeword is strictly enforced (see Section VII for justification), then, to avoid exhaustive path search, the decoder must develop a criterion or bound that can be used to disprove the optimality of a Markov path set. This is equivalent to developing an optimality test criterion (OTC) [7] to test whether the ML path (or codeword) belongs to the complementary path set (or codeword set).¹

Two major OTCs have been used in the ML decoding of convolutional codes. The first is the "path covering criterion" (PCC) (explained in [8] and in Appendix A) used in the VA [2][1]. The VA visits all Markov states in chronological order [1]. For each time index, the decoder maintains a set of "cover" (defined in Appendix A) Markov paths, each passing through one of the Markov states [1]. According to the PCC, the "cover" Markov path passing through a Markov state disproves the optimality of all other Markov paths passing through the same state. The second is the family of sum log likelihood (SLL)-based OTCs used extensively in the sphere decoder [10][9]. The sphere decoder models ML decoding as finding the lattice point closest to the channel output in the signal space [9]. Hence the distance between the channel output and an arbitrary lattice point upper bounds the distance from the channel output to the ML codeword. Such a distance bound is based on the SLL of the corresponding codeword, and is used in the sphere decoder [10][9] as well as in other ML decoders [7] as the key means to avoid exhaustive codeword search. In [11][12], Vikalo and Hassibi showed that PCC-based and SLL-based optimality tests can be combined to find the ML codeword without visiting all Markov states.

Assume that the PCC-based optimality test is always implemented. In this paper, we first show that the additional complexity reduction brought by the SLL-based optimality test diminishes as one fixes the coding memory $\nu$ and takes the codeword length $N$ to infinity. Such inefficiency is due to the fact that the SLL-based OTC does not exploit the structure of the convolutional code. Searching for the ML codeword is equivalent to finding the ML source message, which contains a sequence of source symbols. We show that whether the ML message contains a particular symbol at a given time index can be tested using an OTC that depends only on the log likelihoods of channel output symbols in a fixed-sized time neighborhood. We call such a test the neighboring log likelihood (NLL)-based optimality test, and show that its efficiency does not depend on the codeword length. We theoretically demonstrate that the NLL-based optimality test can bring significant complexity reduction to ML decoding when the communication system has a high signal-to-noise ratio (SNR). The complexity of a decoder using the SLL-based optimality test, on the other hand, remains the same as that of the VA for all SNR if the codeword length is taken to infinity. The results are also generalized to ML sequence detection in a class of discrete-time hidden Markov systems [13].

¹In the literature, such as [7], OTC refers to a criterion designed to test whether a single codeword is optimal. In this paper, we extend the definition of OTC to a general criterion that can either verify or disprove the optimality of a codeword set.

II. Problem Formulation
Let $C$ be an $(n, k)$ convolutional code over GF($q$) defined by a polynomial generator matrix $G(D)$ [14],

$$G(D) = G[0] + G[1]D + \cdots + G[\nu-1]D^{\nu-1}, \tag{1}$$

where $D$ is the delay operator; $\nu$ is the coding memory; $G[l]$, $l = 0, \ldots, \nu-1$, are $k \times n$ matrices over GF($q$). Assume $G(D)$ is a minimal encoder [14].

Denote the source message by a sequence of vector symbols,

$$x(D) = x[d]D^{d} + x[d+1]D^{d+1} + \cdots, \tag{2}$$

where $d$ is the time index, possibly negative; $x[d]$, $\forall d$, are row vectors of dimension $k$ over GF($q$). The encoded message, or the corresponding codeword, is given by

$$y(D) = x(D)G(D) = \sum_{d} \sum_{l=0}^{\nu-1} x[d-l]G[l]D^{d}. \tag{3}$$

To simplify the presentation, we assume the time index $d$ takes all integer values. We assume $x[d] = 0$ for $d < 0$ and $d > N$. We term $N$ the codeword length.
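The encoding rule (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes $q$ is prime (so that GF($q$) arithmetic reduces to integer arithmetic modulo $q$), stores $G[0], \ldots, G[\nu-1]$ as a NumPy array of shape `(nu, k, n)`, and uses the hypothetical helper name `conv_encode`. The array passed in holds the nonzero part of $x(D)$, starting at $d = 0$.

```python
import numpy as np

def conv_encode(x, G, q):
    """Sketch of (3): y[d] = sum_{l=0}^{nu-1} x[d-l] G[l] over GF(q), q prime (assumption).

    x : (L, k) array holding the nonzero source symbols x[0], ..., x[L-1].
    G : (nu, k, n) array holding G[0], ..., G[nu-1].
    Returns the (L + nu - 1, n) array of codeword symbols y[0], ..., y[L+nu-2].
    """
    x = np.asarray(x, dtype=int)
    G = np.asarray(G, dtype=int)
    L, _ = x.shape
    nu, _, n = G.shape
    y = np.zeros((L + nu - 1, n), dtype=int)
    for d in range(L + nu - 1):
        for l in range(nu):
            if 0 <= d - l < L:          # x[d - l] = 0 outside the message
                y[d] += x[d - l] @ G[l]
    return y % q

# Rate-1/2 binary example with G(D) = [1 + D + D^2, 1 + D^2] (nu = 3 taps).
G = [[[1, 1]], [[1, 0]], [[1, 1]]]
print(conv_encode([[1], [0], [1], [1]], G, q=2).tolist())
# [[1, 1], [1, 0], [0, 0], [0, 1], [0, 1], [1, 1]]
```

The double loop mirrors the double sum in (3) directly; a polynomial-multiplication routine would be faster but would hide the correspondence with the equation.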

Define a function $g_q(y)$ that maps $y$ from GF($q$) to $\mathbb{R}$ (the set of real numbers) in a one-to-one sense. If $y(D)$ is a vector sequence, $g_q(y(D))$ applies the mapping to each of the elements of $y(D)$, respectively.² Assume the codeword is transmitted over a memoryless Gaussian channel. The channel output symbol sequence is given by

$$r(D) = g_q(y(D)) + n(D) = g_q(x(D)G(D)) + n(D), \tag{4}$$

where $n(D) = n[d]D^{d} + n[d+1]D^{d+1} + \cdots$ is the noise sequence with $n[d] \sim \mathcal{N}(0, \sigma^2 I)$ being i.i.d. Gaussian. Without loss of generality, we define the scaled signal-to-noise ratio of the system as $\mathrm{SNR} = 1/\sigma^2$. In Section VI we show that the results are generalizable not only to other channel models, but also to a class of hidden Markov systems.

²Hence the output of $g_q(y(D))$ is a vector sequence of the same length and dimension as $y(D)$.



Given the channel output, for any source message $x(D)$ and its corresponding codeword $y(D) = x(D)G(D)$, we define the "negative SLL" as

$$S_x(x(D)) = S_y(y(D)) = \sum_{d} \left\| r[d] - g_q(y[d]) \right\|^2. \tag{5}$$

The objective of ML decoding is to find the ML message $x_{ML}(D)$ that minimizes the negative SLL,

$$x_{ML}(D) = \arg\min_{x[d],\ 0 \le d \le N} S_x(x(D)). \tag{6}$$
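For very short messages, (5) and (6) can be checked by exhaustive search, which is exactly the cost that the optimality tests in this paper aim to avoid. The sketch below assumes a binary code with $k = 1$, the hypothetical antipodal mapping $g_2(0) = +1$, $g_2(1) = -1$, and brute force over all $2^L$ messages of length $L$; the helper names are illustrative, and this is only a reference point rather than a practical decoder.

```python
import itertools
import numpy as np

def encode_binary(bits, taps):
    """y[d] = sum_l x[d-l] taps[l] mod 2 for k = 1; taps has shape (nu, n)."""
    taps = np.asarray(taps, dtype=int)
    nu, n = taps.shape
    y = np.zeros((len(bits) + nu - 1, n), dtype=int)
    for d in range(y.shape[0]):
        for l in range(nu):
            if 0 <= d - l < len(bits):
                y[d] += bits[d - l] * taps[l]
    return y % 2

def neg_sll(r, y):
    """Negative SLL (5) under the assumed mapping 0 -> +1, 1 -> -1."""
    return float(np.sum((np.asarray(r, dtype=float) - (1.0 - 2.0 * y)) ** 2))

def ml_brute_force(r, taps, L):
    """Exhaustive minimization (6) over all 2^L binary messages of length L."""
    best = min(itertools.product((0, 1), repeat=L),
               key=lambda bits: neg_sll(r, encode_binary(bits, taps)))
    return list(best)

taps = [[1, 1], [1, 0], [1, 1]]                      # G(D) = [1 + D + D^2, 1 + D^2]
r = 1.0 - 2.0 * encode_binary([1, 0, 1, 1], taps)    # noiseless channel output
r[2, 0] = -0.2                                       # perturb one observation
print(ml_brute_force(r, taps, 4))                    # [1, 0, 1, 1]
```

The search cost grows as $2^L$, which is why the VA (linear in $L$) and the optimality tests discussed below matter.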

Throughout this paper, we assume the PCC-based optimality test is always implemented. For the sake of completeness, a description of the PCC-based optimality test is given in Appendix A.

III. Inefficiency of Sum Log Likelihood-based Optimality Test 

For ML decoders using an SLL-based optimality test, the decoder first obtains a quick guess of the source message without solving the ML decoding problem. The SLL of the obtained message is then used to help disprove the optimality of certain Markov path sets and consequently to avoid exhaustive path search. We make the idealized assumption that the "guessed" message equals the transmitted message.³ We show in this section that, even under this ideal assumption, the complexity reduction brought by SLL-based optimality tests still diminishes as we take $N$ to infinity.

Let $\bar{x}(D)$ be the actual source message, which is also the message "guessed" by the decoder. Let $\bar{y}(D) = \bar{x}(D)G(D)$ be the transmitted codeword. The corresponding negative SLL is given by

$$S_x(\bar{x}(D)) = \sum_{d=0}^{N+\nu-1} \left\| r[d] - g_q(\bar{y}[d]) \right\|^2 = \sum_{d=0}^{N+\nu-1} \left\| n[d] \right\|^2. \tag{7}$$

Now consider a subset of time indices $D_x \subset [0, N)$. Let $\{\tilde{x}[d] \mid d \in D_x\}$ be a partial message defined only at time indices in $D_x$. Denote by $\{x(D_x)\}$ the set of messages satisfying

$$\{x(D_x)\} = \{x_0(D) \mid x_0[d] = \tilde{x}[d],\ \forall d \in D_x;\ x_0(D) \ne \bar{x}(D)\}. \tag{8}$$

Suppose the decoder wants to test whether it can disprove the optimality of $\{x(D_x)\}$, i.e., whether $x_{ML}(D) \notin \{x(D_x)\}$. A common practice [7][11][12] is to find a lower bound, denoted by $\underline{S}_x(x(D_x))$, of the negative SLLs of the messages in $\{x(D_x)\}$:

$$S_x(x_0(D)) \ge \underline{S}_x(x(D_x)), \quad \forall x_0(D) \in \{x(D_x)\}. \tag{9}$$

If the lower bound $\underline{S}_x(x(D_x))$ is larger than $S_x(\bar{x}(D))$ obtained in (7), then we have $S_x(x_0(D)) \ge \underline{S}_x(x(D_x)) > S_x(\bar{x}(D))$ for all $x_0(D) \in \{x(D_x)\}$, which means the ML message is not in $\{x(D_x)\}$.

³Note that the decoder still needs to verify whether the guessed message is indeed the ML solution. If it is not, then a search for the ML message must be carried out.



In Appendix B we show that the SLL lower bounds that appear in the literature satisfy the following assumption.

Assumption 1: Given $\{x(D_x)\}$, let $D_y \subset [0, N+\nu)$ be the maximum time index set over which we can find a partial codeword $\tilde{y}(D_y)$ such that, for all $x_0(D) \in \{x(D_x)\}$ with $y_0(D) = x_0(D)G(D)$, we have $y_0[d] = \tilde{y}[d]$ for all $d \in D_y$. Note that $D_y$ and $\tilde{y}(D_y)$ are uniquely determined by $\{x(D_x)\}$. We also have $|D_y| < |D_x| + \nu$.

We assume the existence of a positive constant $\epsilon \in (0, 1]$, whose value does not depend on $N$, such that

$$\underline{S}_x(x(D_x)) \le \sum_{d \in D_y} \left\| r[d] - g_q(\tilde{y}[d]) \right\|^2 + (N + \nu - |D_y|)(1-\epsilon)n\sigma^2. \tag{10}$$

■

As demonstrated in [11][7], if we fix $N$, using $\underline{S}_x(x(D_x)) > S_x(\bar{x}(D))$ as the OTC to disprove the optimality of the message set $\{x(D_x)\}$ can bring significant complexity reduction to ML decoding, especially under high SNR. However, if we define $D_e \subset D_y$ as the subset of time indices corresponding to the erroneous codeword symbols, i.e.,

$$D_e = \{d \mid d \in D_y,\ \tilde{y}[d] \ne \bar{y}[d]\}, \tag{11}$$

the following lemma shows that SLL-based optimality tests become inefficient if $N - |D_x|$ is taken to infinity while $|D_e|$ is kept finite.

Lemma 1: Assume the generator matrix $G(D)$ is fixed, and therefore the constraint length $\nu$ is fixed. Consider message sets characterized by $\{x(D_x)\}$ for arbitrary $D_x$ but under the constraint of a fixed $D_e$, where $D_e \subset D_y$ is defined in (11) and the derivation of $D_y$ is specified in Assumption 1. If we fix SNR and take $N - |D_x|$ to infinity, we have

$$\lim_{N - |D_x| \to \infty} P\{\underline{S}_x(x(D_x)) > S_x(\bar{x}(D))\} = 0. \tag{12}$$

If we first take $N - |D_x|$ to infinity and then take SNR to infinity, we have

$$\lim_{\mathrm{SNR} \to \infty} \lim_{N - |D_x| \to \infty} P\{\underline{S}_x(x(D_x)) > S_x(\bar{x}(D))\} = 0. \tag{13}$$

■

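Proof: Since $|D_y| < |D_x| + \nu$, taking $N - |D_x|$ to infinity implies taking $N - |D_y|$ to infinity. According to Assumption 1, and since $\tilde{y}[d] = \bar{y}[d]$ (hence $r[d] - g_q(\tilde{y}[d]) = n[d]$) for $d \in D_y \setminus D_e$, we have

$$\frac{\underline{S}_x(x(D_x)) - S_x(\bar{x}(D))}{N+\nu-|D_y|} \le \frac{1}{N+\nu-|D_y|} \sum_{d \in D_e} \left\| r[d] - g_q(\tilde{y}[d]) \right\|^2 + (1-\epsilon)n\sigma^2 - \frac{1}{N+\nu-|D_y|} \sum_{d \notin D_y} \left\| n[d] \right\|^2 - \frac{1}{N+\nu-|D_y|} \sum_{d \in D_e} \left\| n[d] \right\|^2, \tag{14}$$

where $d \notin D_y$ ranges over $[0, N+\nu) \setminus D_y$.

Since the $n[d]$ are i.i.d. Gaussian with covariance matrix $\sigma^2 I$, the $\|n[d]\|^2$ are i.i.d. scaled $\chi^2$ random variables with $n$ degrees of freedom, mean $n\sigma^2$, and variance $2n\sigma^4$. Therefore, with probability one as $N - |D_y| \to \infty$, we have $\frac{1}{N+\nu-|D_y|}\sum_{d \notin D_y} \|n[d]\|^2 \to n\sigma^2$, $\frac{1}{N+\nu-|D_y|}\sum_{d \in D_e} \|r[d] - g_q(\tilde{y}[d])\|^2 \to 0$, and $\frac{1}{N+\nu-|D_y|}\sum_{d \in D_e} \|n[d]\|^2 \to 0$ (the last two sums contain a fixed number $|D_e|$ of terms). Consequently, denoting the right-hand side of (14) by $U_0$, we have with probability one

$$\lim_{N - |D_y| \to \infty} U_0 = -\epsilon n\sigma^2 < 0. \tag{15}$$

This yields

$$\lim_{N - |D_x| \to \infty} P\{\underline{S}_x(x(D_x)) > S_x(\bar{x}(D))\} = \lim_{N - |D_x| \to \infty} P\left\{\frac{\underline{S}_x(x(D_x)) - S_x(\bar{x}(D))}{N+\nu-|D_y|} > 0\right\} \le \lim_{N - |D_x| \to \infty} P\{U_0 > 0\} = 0. \tag{16}$$

Since (16) holds for all SNR, the conclusion remains true if we take SNR to infinity after $N - |D_x|$ is taken to infinity.⁴ ■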
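The mechanism behind Lemma 1 can be seen numerically. The sketch below estimates $P\{U_0 > 0\}$, with $U_0$ the right-hand side of (14), by Monte Carlo. It assumes real Gaussian noise, a single erroneous codeword symbol ($|D_e| = 1$) whose mapped value is offset by a vector of squared norm `delta2`, and illustrative values of $n$, $\epsilon$ and $\sigma^2$; the function name and all parameter values are hypothetical. As $L = N + \nu - |D_y|$ grows, the estimate should approach zero, in line with (15) and (16).

```python
import numpy as np

def prob_U0_positive(L, n=2, e=1, eps=0.5, sigma2=1.0, delta2=4.0, trials=2000, seed=0):
    """Monte Carlo estimate of P{U_0 > 0}, U_0 being the right-hand side of (14).

    L      : N + nu - |D_y|, the number of codeword symbols outside D_y.
    e      : |D_e|, the (fixed) number of erroneous symbols inside D_y.
    delta2 : assumed ||g_q(ybar[d]) - g_q(ytilde[d])||^2 for each d in D_e.
    """
    rng = np.random.default_rng(seed)
    delta = np.zeros(n)
    delta[0] = np.sqrt(delta2)
    hits = 0
    for _ in range(trials):
        n_out = rng.normal(0.0, np.sqrt(sigma2), size=(L, n))   # n[d], d outside D_y
        n_err = rng.normal(0.0, np.sqrt(sigma2), size=(e, n))   # n[d], d in D_e
        err = np.sum((n_err + delta) ** 2)                      # sum ||r[d] - g_q(ytilde[d])||^2
        U0 = (err - np.sum(n_err ** 2) - np.sum(n_out ** 2)) / L + (1.0 - eps) * n * sigma2
        hits += int(U0 > 0)
    return hits / trials

for L in (1, 10, 100, 1000):
    print(L, prob_U0_positive(L, trials=500))
```

The $D_e$ terms enter only through a sum with a fixed number of terms divided by $L$, so they are washed out, while the average of the noise energy outside $D_y$ concentrates at $n\sigma^2$, exceeding the $(1-\epsilon)n\sigma^2$ allowance of (10).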

With the help of Lemma 1, the inefficiency of SLL-based optimality tests is characterized by the following lemma.

Lemma 2: Let $C_{sll}$ be the complexity of an ML decoder that uses only PCC- and SLL-based optimality tests for complexity reduction. Let $C_{va}$ be the complexity of the Viterbi decoder, in which only the PCC-based optimality test is used. For any $\delta > 0$, we have

$$\lim_{N \to \infty} P\{C_{sll} > (1-\delta)C_{va}\} = 1, \qquad \lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{C_{sll} > (1-\delta)C_{va}\} = 1. \tag{17}$$

■

The proof of Lemma 2 is given in Appendix C.

IV. Neighboring Log Likelihood-based Optimality Test 

We propose in Theorem 1 a class of NLL-based optimality tests whose efficiency does not depend on the codeword length $N$. We show in Section V that these NLL-based optimality tests can significantly reduce the average complexity of ML decoding under high SNR. This is in contrast to SLL-based optimality tests, which are not able to bring meaningful complexity reduction if $N$ is taken to infinity first.

Theorem 1: Define $d^2_{\min}$, $d^2_{\max}$ by

$$d^2_{\min} = \min_{y_1 \ne y_2} \left\| g_q(y_1) - g_q(y_2) \right\|^2, \qquad d^2_{\max} = \max_{y_1 \ne y_2} \left\| g_q(y_1) - g_q(y_2) \right\|^2, \tag{18}$$

⁴Note that the order in which limits are taken in (13) is important. If we instead take SNR to infinity first, we can get $\lim_{N - |D_x| \to \infty} \lim_{\mathrm{SNR} \to \infty} P\{\underline{S}_x(x(D_x)) > S_x(\bar{x}(D))\} = 1$.



where $y_1$, $y_2$ are $n$-dimensional row vectors over GF($q$). Let $\xi$ be an arbitrary constant and $M$ an arbitrary integer satisfying

0<^<!lHH£, M>-=. (19) 

Let $x_0(D)$ be a source message whose corresponding codeword is $y_0(D)$. For any time index $m$, if the following inequality is satisfied for all $d \in [m - 2M\nu, m + 2M\nu)$,

\\r[d]-g,{y,[dm<^-^, (20) 

and the following inequalities hold, 

m+(2M+l)i/-l 

E \\r[d] - g,iy,[d])f < M^ - udl,^ 

d=m+2Mi' 
m-2Mu-l 

E \\r[d]-g,iy,[d]W<MC-i^dl^., (21) 

d=m-(2A/+l)i/ 

then we must have $x_0[m'] = x_{ML}[m']$, $\forall m' \in [m, m+\nu)$. ■

We skip the proof of Theorem 1 since the result is implied by Theorem 3 presented in Section VI. Note that the values of $d_{\min}$ and $d_{\max}$ depend only on the $g_q(\cdot)$ function. Hence, as long as $g_q(\cdot)$ and $\nu$ are given, the values of $\xi$ and $M$ can be fixed. Given $M$, the optimality test presented in Theorem 1 tests the optimality of $\{x[m'] \mid m' \in [m, m+\nu)\}$ using the log likelihoods of channel output symbols within a fixed-sized time interval $[m - (2M+1)\nu, m + (2M+1)\nu)$. It is intuitive that the efficiency of the test does not depend on the codeword length if all other parameters are fixed.

Efficiency of the OTC proposed in Theorem 1 is characterized by the following lemma.

Lemma 3: Assume $\xi$ and $M$ are chosen to satisfy (19). Let $m$ be an arbitrary time index. Let $y_0(D)$ equal the transmitted codeword within the time interval $[m - (2M+1)\nu, m + (2M+1)\nu)$. Define $OPT_m$ as the event that (21) is satisfied and (20) is satisfied for all $d \in [m - 2M\nu, m + 2M\nu)$. Fixing all other parameters and taking SNR to infinity, we have

$$\lim_{\mathrm{SNR} \to \infty} P\{OPT_m\} = 1. \tag{22}$$

The same conclusion holds if we first take $N$ to infinity and then take SNR to infinity:

$$\lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{OPT_m\} = 1. \tag{23}$$

Proof: If $y_0(D)$ equals the transmitted codeword within the time interval $[m - (2M+1)\nu, m + (2M+1)\nu)$, then for $d \in [m - (2M+1)\nu, m + (2M+1)\nu)$ we have

$$r[d] - g_q(y_0[d]) = n[d]. \tag{24}$$

Consequently, (20) and (21) hold with probability converging to one, because the $\|n[d]\|^2$ are i.i.d. scaled $\chi^2$ random variables whose mean, $n\sigma^2$, and variance, $2n\sigma^4$, converge to 0 as SNR goes to infinity. ■

Lemma 3 implies that, if there is a suboptimal decoder whose probability of symbol detection error (as opposed to sequence detection error) is low under high SNR, then NLL-based optimality tests can help transform the suboptimal detector into an ML detector with only a marginal increase in average decoding complexity. An example of such a transformation is presented in the following section.

V. A Three-step ML Decoding Framework 

The communication system given in Section II follows a discrete-time hidden Markov model [13], where each Markov state at time index $d$ corresponds to a possible combination of source symbols in the time interval $[d - \nu, d]$. If a decoder obtains the ML codeword using the VA, all Markov states within the time interval $[\nu, N]$ have to be visited. Alternatively, if one can use a low-complexity algorithm to disprove the optimality of most of the Markov states, then the VA can limit its search by visiting only a small subset of Markov states.

Following this idea, the three-step ML decoding framework is given as follows.

• Step 1: The decoder uses a suboptimal algorithm (denoted by $\Phi_{sub}$) to obtain a quick guess of the codeword $\hat{y}(D)$ and its corresponding source message $\hat{x}(D)$.

• Step 2: An NLL-based optimality test (specified in Theorem 1) is applied to each of the source symbols of $\hat{x}(D)$. The decoder maintains a source symbol set sequence $X(D)$, with $X[d]$ being the source symbol set of time index $d$. If $\hat{x}[d] = x_{ML}[d]$ can be confirmed by the optimality test, we let $X[d] = \{\hat{x}[d]\}$; otherwise, we let $X[d]$ be the set of all possible source symbol vectors at time index $d$.

• Step 3: The decoder uses a modified VA to search for the ML source message. The only difference between the modified VA and the conventional VA is that the modified VA visits a Markov state only if all source symbols corresponding to the Markov state belong to the source symbol sets $X[d]$ of the corresponding time indices.

Implementing the modified VA is straightforward; hence its further description is skipped. Compared with the three-step decoding algorithm studied in [7], the key advantage of using an NLL-based optimality test is that the test can be applied to an individual source symbol rather than to the whole source message.
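Although the text skips the details, the modified VA of Step 3 is easy to sketch. The Python fragment below is a minimal illustration under assumptions that are ours, not the paper's: a binary rate-$1/n$ code ($k = 1$) given by its tap rows $G[0], \ldots, G[\nu-1]$, the antipodal mapping $0 \to +1$, $1 \to -1$, a squared-Euclidean branch metric, and a zero tail. Each survivor plays the role of a "cover" path in the sense of the PCC, and a state is entered only if the new source symbol lies in $X[d]$ (the older symbols of the state were already restricted when they entered). The name `modified_viterbi` is hypothetical.

```python
import numpy as np

def modified_viterbi(r, taps, allowed):
    """Viterbi search restricted to source symbol sets X[d] (binary, k = 1).

    r       : (L + nu - 1, n) array of channel outputs.
    taps    : (nu, n) array whose row l is G[l].
    allowed : list of L sets; allowed[d] plays the role of X[d].
    Returns (best message within the restricted search, number of visited states).
    """
    r = np.asarray(r, dtype=float)
    taps = np.asarray(taps, dtype=int)
    nu = taps.shape[0]
    L = len(allowed)
    zero = (0,) * (nu - 1)
    survivors = {zero: (0.0, [])}    # state (x[d-1], ..., x[d-nu+1]) -> (metric, path)
    visited = 0
    for d in range(L + nu - 1):
        symbols = allowed[d] if d < L else {0}          # zero tail after the message
        new = {}
        for state, (metric, path) in survivors.items():
            for x in symbols:
                y = (x * taps[0] + sum(s * taps[i + 1] for i, s in enumerate(state))) % 2
                m = metric + float(np.sum((r[d] - (1.0 - 2.0 * y)) ** 2))
                nxt = ((x,) + state)[: nu - 1]
                if nxt not in new or m < new[nxt][0]:   # PCC: keep the cover path only
                    new[nxt] = (m, path + [x])
        visited += len(new)
        survivors = new
    return survivors[zero][1][:L], visited

taps = [[1, 1], [1, 0], [1, 1]]                  # G(D) = [1 + D + D^2, 1 + D^2]
r = np.array([[-1, -1], [-1, 1], [-0.2, 1], [1, -1], [1, -1], [-1, -1]], dtype=float)
print(modified_viterbi(r, taps, [{0, 1}] * 4))          # ([1, 0, 1, 1], 17): conventional VA
print(modified_viterbi(r, taps, [{1}, {0}, {1}, {1}]))  # ([1, 0, 1, 1], 6): all X[d] singletons
```

When every $X[d]$ is a singleton, the search visits one state per time unit, which is the regime that Theorem 2 below identifies at high SNR.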

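Theorem 2: Let $P_e\{\Phi_{sub}\}$ be the probability of symbol detection error of $\Phi_{sub}$. Assume, while fixing all other parameters,

$$\lim_{\mathrm{SNR} \to \infty} P_e\{\Phi_{sub}\} = 0, \qquad \lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P_e\{\Phi_{sub}\} = 0. \tag{25}$$

Let $C_{mva}$ be the average number of Markov states per time unit visited by the modified VA in the third step of the ML decoder. For any $\delta > 0$, we have

$$\lim_{\mathrm{SNR} \to \infty} P\{C_{mva} < 1 + \delta\} = 1, \qquad \lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{C_{mva} < 1 + \delta\} = 1. \tag{26}$$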

Proof: Let $\bar{x}(D)$, $\bar{y}(D)$ be the actual source message and the transmitted codeword, respectively. Let $\hat{x}(D)$, $\hat{y}(D)$ be the source message and the codeword output by $\Phi_{sub}$. According to (25), and by a union bound over the finitely many source symbols that determine the codeword symbols in a fixed-sized window, for any time index $m$ we have

$$\lim_{\mathrm{SNR} \to \infty} P\left\{\hat{y}[d] = \bar{y}[d],\ \forall d \in [m - (2M+1)\nu, m + (2M+1)\nu)\right\} = 1, \tag{27}$$

where $M$ is the parameter of the NLL-based optimality test specified in Theorem 1. According to (27), Lemma 3, and Theorem 1, for any $m$, if $\hat{y}[d] = \bar{y}[d]$, $\forall d \in [m - (2M+1)\nu, m + (2M+1)\nu)$, then the probability that the NLL-based optimality test can confirm $\hat{x}[d] = x_{ML}[d]$, $\forall d \in [m, m+\nu)$, converges to one as $\mathrm{SNR} \to \infty$. Consequently, letting $X[d]$ be the source symbol set maintained by the ML decoder in the second step, we have

$$\lim_{\mathrm{SNR} \to \infty} P\{|X[d]| = 1,\ \forall d \in [m, m+\nu)\} = 1, \quad \forall m. \tag{28}$$

Since the worst-case complexity of the modified VA is bounded, (28) implies that, for any $\delta > 0$, $\lim_{\mathrm{SNR} \to \infty} P\{C_{mva} < 1 + \delta\} = 1$. Since all derivations hold if we first take $N$ to infinity, we also have $\lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{C_{mva} < 1 + \delta\} = 1$. ■

By sharing computations among optimality tests, it is easy to see that the complexity of the second step of the ML decoder is equivalent, in order, to visiting one Markov state per time unit. Therefore, if $\Phi_{sub}$ satisfies (25), then as $\mathrm{SNR} \to \infty$ the complexity of the three-step ML decoder converges to the complexity of $\Phi_{sub}$, which can be significantly lower than the complexity of the VA. Moreover, the three steps of the ML decoder can be implemented in a parallelized manner, in the sense that each step can process some of the source symbols without waiting for the previous step to completely finish its work. An example of such a parallelized implementation can be found in [15, The Simple MLSD Algorithm].

VI. Maximum Likelihood Sequence Detection in a Class of Hidden Markov Systems

In this section, we generalize the results of Section IV to ML sequence detection (MLSD) in a class of first-order discrete-time hidden Markov systems [13]. We demonstrate in Appendix D that the communication system presented in Section II satisfies the model and the key assumptions given in this section.
Let $u(D) = u[d]D^{d} + u[d+1]D^{d+1} + \cdots$ be a first-order Markov sequence, where $d$ is the time index, possibly negative; $u[d]$ represents the Markov state (at time $d$), which is a $k_u$-dimensional row vector defined over GF($q$). We assume $u[d] = 0$ for $d < 0$ and $d > N$, with $N$ being the sequence length. Define $y[d] = y(u[d])$ as the "processed state", which is a deterministic function of $u[d]$; $y[d]$ is an $n$-dimensional row vector defined over GF($q$). We term $y(D) = y[d]D^{d} + y[d+1]D^{d+1} + \cdots$ the processed state sequence. Let $r(D) = r[d]D^{d} + r[d+1]D^{d+1} + \cdots$ be the observation sequence, where $r[d]$ is an $n_r$-dimensional row vector with real-valued elements.

Denote the state transition probability of the hidden Markov system by

$$P_t(u_1 \mid u_2) = P\{u[d+1] = u_1 \mid u[d] = u_2\}. \tag{29}$$

Define the transition probability ratio bound $p_{tr}$ by

$$p_{tr} = \min_{\substack{u_1, u_2:\ P_t(u_1 \mid u_2) > 0 \\ u_3, u_4:\ P_t(u_3 \mid u_4) > 0}} \frac{P_t(u_1 \mid u_2)}{P_t(u_3 \mid u_4)}. \tag{30}$$

We assume the Markov chain is ergodic and homogeneous. Therefore, there exists a positive integer $\nu$ such that

$$P\{u[d+\nu] = u_1 \mid u[d] = u_2\} \ne 0, \quad \forall u_1, u_2. \tag{31}$$
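Both $p_{tr}$ in (30) and an integer $\nu$ satisfying (31) can be computed directly from a finite transition matrix. The sketch below assumes a finite state space and a row-stochastic NumPy matrix whose entry $(i, j)$ is $P_t(j \mid i)$; the function names are hypothetical. Since (31) is a positivity condition on the $\nu$-step transition matrix, it searches for the smallest power whose entries are all positive.

```python
import numpy as np

def transition_ratio_bound(P):
    """p_tr in (30): smallest positive transition probability over the largest one."""
    P = np.asarray(P, dtype=float)
    positive = P[P > 0]
    return positive.min() / positive.max()

def mixing_index(P, max_nu=100):
    """Smallest nu with P{u[d+nu] = u1 | u[d] = u2} > 0 for all u1, u2, as in (31).

    P[i, j] = P_t(j | i), i.e. rows index the current state.
    Returns None if no such nu <= max_nu exists (e.g. for a periodic chain).
    """
    P = np.asarray(P, dtype=float)
    Q = np.eye(P.shape[0])
    for nu in range(1, max_nu + 1):
        Q = Q @ P                      # Q is now the nu-step transition matrix
        if np.all(Q > 0):
            return nu
    return None

P = [[0.5, 0.5],
     [1.0, 0.0]]
print(transition_ratio_bound(P), mixing_index(P))   # 0.5 2
```

Once all entries of the $\nu_0$-step matrix are positive, every higher power stays positive (each row of $P$ has a positive entry), which is the monotonicity in $\nu$ used after Assumption 2.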

Denote the observation distribution function by

$$F_o(r \mid y_1) = P\{r[d] \le r \mid y[d] = y_1\}. \tag{32}$$

Let the corresponding probability density function (or probability mass function) be $f_o(r \mid y_1)$.

We also make the following two key assumptions.

Assumption 2: We assume that state processing $y[d] = y(u[d])$ does not compromise the observability of the Markov states, in the sense that there exists a positive integer $\nu$ satisfying the following property. Given two Markov state sequences $u(D)$ and $\tilde{u}(D)$, for any time index $d$, if $u[d] \ne \tilde{u}[d]$, then we can find a time index $m \in (d - \nu, d + \nu)$ such that $y(u[m]) \ne y(\tilde{u}[m])$.

Note that we use the same constant $\nu$ in (31) and in Assumption 2. This is valid because if (31) is satisfied for $\nu = \nu_0$, then it is also satisfied for all $\nu > \nu_0$; a similar property applies to Assumption 2. Consequently, if Assumption 2 holds, a common integer $\nu$ satisfying both (31) and Assumption 2 can always be found.

Assumption 3: Assume the existence of two functions, $L_l(r, y_1)$ and $L_u(r, y_1)$, both of which are functions of the channel output symbol $r$ and the processed state $y_1$. Assume $L_l(r, y_1)$ and $L_u(r, y_1)$ have the following two properties.

First, the following inequalities hold for all $r$ and $y_1$:

$$L_l(r, y_1) \le \min_{y_2 \ne y_1} \left[-\log f_o(r \mid y_2) + \log f_o(r \mid y_1)\right], \qquad L_u(r, y_1) \ge \max_{y_2, y_3} \left[-\log f_o(r \mid y_2) + \log f_o(r \mid y_3)\right]. \tag{33}$$

Second, the complexity of evaluating $L_l(r, y_1)$ and $L_u(r, y_1)$ is low, in the sense that it does not require searching over any processed state other than $y_1$. ■

Note that the validity of the results presented in this section does not depend on the second property imposed in Assumption 3. However, we still include the property in the assumption since the key motivation for posing Assumption 3 is to use the two functions $L_l(r, y_1)$ and $L_u(r, y_1)$ as tools to avoid exhaustive Markov state search and hence to reduce the complexity of ML decoding. Also note that the right-hand side of the second inequality in (33) is not a function of $y_1$. However, the upper bound on the left-hand side is a function of a processed state $y_1$, since one often needs a "reference state" in order to upper bound the right-hand side of (33). Further explanation is given in Appendix D.
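As an illustration of Assumption 3 (not necessarily the construction used in Appendix D), consider an assumed Gaussian observation model $f_o(r \mid y) \propto \exp(-\|r - g_q(y)\|^2 / (2\sigma^2))$, so that $-\log f_o(r \mid y_2) + \log f_o(r \mid y_1) = (\|r - g_q(y_2)\|^2 - \|r - g_q(y_1)\|^2)/(2\sigma^2)$. With $a = \|r - g_q(y_1)\|$ and $d_{\min}$, $d_{\max}$ the square roots of the quantities in (18), the triangle inequality gives $\|r - g_q(y_2)\| \ge \max(0, d_{\min} - a)$ for $y_2 \ne y_1$ and $\|r - g_q(y_2)\| \le a + d_{\max}$ for all $y_2$. The resulting bounds use only the reference state $y_1$; the sketch below checks them against brute force over a small constellation. Function names and the example constellation are hypothetical.

```python
import numpy as np

def gaussian_bounds(r, c1, d_min, d_max, sigma2):
    """L_l and L_u of Assumption 3 for an assumed Gaussian f_o, using only c1 = g_q(y1)."""
    a = float(np.linalg.norm(np.asarray(r, float) - np.asarray(c1, float)))
    L_l = (max(0.0, d_min - a) ** 2 - a ** 2) / (2.0 * sigma2)   # lower bound on the min in (33)
    L_u = (a + d_max) ** 2 / (2.0 * sigma2)                      # upper bound on the max in (33)
    return L_l, L_u

def exact_terms(r, c1, points, sigma2):
    """Brute-force min over y2 != y1 and max over (y2, y3) of the terms in (33)."""
    r = np.asarray(r, float)

    def dist2(c):
        return float(np.sum((r - np.asarray(c, float)) ** 2))

    others = [c for c in points if not np.array_equal(c, c1)]
    lo = min(dist2(c) - dist2(c1) for c in others) / (2.0 * sigma2)
    hi = max(dist2(c2) - dist2(c3) for c2 in points for c3 in points) / (2.0 * sigma2)
    return lo, hi

# Assumed example: n = 2 antipodal mapping, so d_min = 2 and d_max = 2*sqrt(2).
points = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
Ll, Lu = gaussian_bounds((0.3, -0.2), (1, -1), 2.0, 2.0 * np.sqrt(2.0), 0.5)
lo, hi = exact_terms((0.3, -0.2), (1, -1), points, 0.5)
print(Ll <= lo, Lu >= hi)   # True True
```

The bounds cost one distance computation regardless of the constellation size, which is the kind of low evaluation complexity the second property of Assumption 3 asks for.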

Given the observation sequence $r(D)$, the negative SLL of a state sequence $u(D)$ is obtained by

$$S_u(u(D)) = -\sum_{d=0}^{N} \log\left(f_o(r[d] \mid y[d])\, P_t(u[d] \mid u[d-1])\right). \tag{34}$$

The objective of MLSD is to find the ML sequence that minimizes the negative SLL,

$$u_{ML}(D) = \arg\min_{u[d],\ 0 \le d \le N} S_u(u(D)). \tag{35}$$

The following theorem gives a class of NLL-based optimality tests.

Theorem 3: Assume the discrete-time Markov system satisfies Assumptions 2 and 3. Let $p > 0$ be a positive constant. Given a Markov state sequence $u(D)$ and the corresponding processed states $y(D)$, let $p_{tr}$ be defined by (30). For any time index $m$, if there is an integer $M > 0$ such that, for all $d \in [m - 2M\nu, m + 2M\nu)$,

$$L_l(r[d], y[d]) > 3\nu(p - \log p_{tr}), \tag{36}$$

and

$$\sum_{d=m+2M\nu}^{m+(2M+1)\nu-1} L_u(r[d], y[d]) < 3M\nu p + (\nu+1)\log p_{tr}, \qquad \sum_{d=m-(2M+1)\nu}^{m-2M\nu-1} L_u(r[d], y[d]) < 3M\nu p + \nu\log p_{tr}, \tag{37}$$

then $u[m+\nu-1] = u_{ML}[m+\nu-1]$ must be true. ■
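The conditions (36) and (37) involve only per-symbol values of $L_l$ and $L_u$ inside the window $[m - (2M+1)\nu, m + (2M+1)\nu)$, so checking them is cheap once these values are available. The sketch below takes precomputed sequences `L_l[d]` $= L_l(r[d], y[d])$ and `L_u[d]` $= L_u(r[d], y[d])$ (0-based indices) and reports whether Theorem 3 certifies $u[m+\nu-1] = u_{ML}[m+\nu-1]$. The function name is hypothetical, and treating windows that run past the available data as a failed test is a conservative choice of ours.

```python
import math

def nll_test(L_l, L_u, m, M, nu, p, p_tr):
    """Check the sufficient conditions (36)-(37) of Theorem 3 at time index m.

    L_l[d], L_u[d] : precomputed L_l(r[d], y[d]) and L_u(r[d], y[d]).
    Returns True only if every condition holds, i.e. Theorem 3 certifies
    u[m + nu - 1] = u_ML[m + nu - 1].
    """
    lo, hi = m - (2 * M + 1) * nu, m + (2 * M + 1) * nu
    if lo < 0 or hi > len(L_l):          # window not fully available: do not certify
        return False
    log_ptr = math.log(p_tr)
    # (36): every d in [m - 2M nu, m + 2M nu) must exceed the per-symbol threshold
    core = range(m - 2 * M * nu, m + 2 * M * nu)
    if any(L_l[d] <= 3 * nu * (p - log_ptr) for d in core):
        return False
    # (37): the two boundary blocks of length nu must stay below their thresholds
    right = sum(L_u[d] for d in range(m + 2 * M * nu, m + (2 * M + 1) * nu))
    left = sum(L_u[d] for d in range(m - (2 * M + 1) * nu, m - 2 * M * nu))
    return (right < 3 * M * nu * p + (nu + 1) * log_ptr
            and left < 3 * M * nu * p + nu * log_ptr)

nu, M, p, p_tr = 1, 1, 1.0, 1.0      # illustrative values; log(p_tr) = 0
print(nll_test([5.0] * 6, [2.0] * 6, 3, M, nu, p, p_tr))                 # True
print(nll_test([5.0, 5.0, 5.0, 2.0, 5.0, 5.0], [2.0] * 6, 3, M, nu, p, p_tr))  # False
```

In the second call, $L_l$ at $d = 3$ falls below $3\nu(p - \log p_{tr}) = 3$, so (36) fails and the symbol is left unconfirmed.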

The proof of Theorem 3 is given in Appendix E. Note that Theorem 3 implies Theorem 1 if its parameters are set to the corresponding values given in Appendix D.
For communication systems following a discrete-time hidden Markov model, $f_o(r \mid y_1)$ often belongs to an ensemble of density (or probability) functions, with the actual realization determined by the SNR. In other words, we can write the observation density (or probability) $f_o(r \mid y_1, \mathrm{SNR})$ as a function of the SNR. Assume the discrete-time Markov system satisfies Assumption 3, where both functions $L_l(r, y_1)$ and $L_u(r, y_1)$ can be functions of the SNR. We make the following assumption.

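Assumption 4: Assume the observation density (or probability) $f_o(r \mid y_1, \mathrm{SNR})$ is a function of the SNR. Assume the discrete-time Markov system satisfies Assumption 3. Let the actual state sequence and the processed state sequence be $u(D)$ and $y(D)$, respectively. Define two positive numbers $d^2_{\min}$ and $d^2_{\max}$ as follows:

$$d^2_{\min} = \sup\left\{\gamma > 0;\ \lim_{\mathrm{SNR} \to \infty} P\{L_l(r[d], y[d]) > \gamma\,\mathrm{SNR}\} = 1\right\}, \qquad d^2_{\max} = \inf\left\{\gamma > 0;\ \lim_{\mathrm{SNR} \to \infty} P\{L_u(r[d], y[d]) < \gamma\,\mathrm{SNR}\} = 1\right\}. \tag{38}$$

We assume

$$d^2_{\min} > 0, \qquad d^2_{\max} < \infty. \tag{39}$$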

The following lemma characterizes the efficiency of the OTC proposed in Theorem 3.

Lemma 4: Assume the discrete-time Markov system satisfies Assumptions 2 and 4. Let the state sequence be $u(D)$. Let $\xi$ be an arbitrary constant and $M$ an arbitrary integer satisfying

$$0 < \xi < \frac{d^2_{\min}}{3\nu}, \qquad M > \frac{d^2_{\max}}{3\xi}. \tag{40}$$

Let $p = \xi\,\mathrm{SNR}$. Given an arbitrary time index $m$, define $OPT_m$ as the event that (37) is satisfied and (36) is satisfied for all $d \in [m - 2M\nu, m + 2M\nu)$. If we fix all other parameters except the SNR, we have

$$\lim_{\mathrm{SNR} \to \infty} P\{OPT_m\} = 1. \tag{41}$$

If we fix all other parameters except the SNR and the sequence length $N$, we have

$$\lim_{\mathrm{SNR} \to \infty} \lim_{N \to \infty} P\{OPT_m\} = 1. \tag{42}$$

We skip the proof of Lemma 4 since it is straightforward.

Note that in Lemma 4, when we take $N$ and SNR to infinity, $M$ can be fixed at a constant. This indicates that, when testing the optimality of a Markov state at a given time index, the NLL-based optimality test uses only observation symbols in a fixed-sized time neighborhood. Based on Theorem 3 and Lemma 4, a three-step ML sequence detector similar to the one presented in Section V can be developed to transform a suboptimal sequence detector into a low-complexity ML sequence detector. The detailed discussion is skipped since it does not essentially differ from the one presented in Section V.
VII. Further Discussions

In a practical system, suboptimal decoders such as belief-propagation-based iterative decoders [5][6] can achieve near-optimal error performance with low complexity. It is natural to ask: if suboptimal decoding causes only a negligible performance loss, why should one even bother with enforcing the ML solution? Note that this question does not suggest a default answer, since the argument can also be presented in the opposite direction: if ML decoding causes only a negligible complexity increase, why should one not use an ML decoder? Nevertheless, the purpose of our work is not to participate in the debate over whether ML decoding is practically useful. Rather, one should interpret Theorem 2 as follows: for convolutional codes, the existence of a well-performing low-complexity suboptimal algorithm implies that ML decoding can be carried out with a similar complexity under high SNR. More importantly, this conclusion holds irrespective of the codeword length.

Although the efficiency of NLL-based optimality tests does not depend on the codeword length, SLL-based optimality tests are inefficient only when the codeword length is large. Lemma 1 and Theorem 2 suggest that the complexity reduction brought by NLL-based optimality tests can be superior to that of SLL-based optimality tests even for moderate SNR if the codeword length is large enough.



Appendix
A. The Path Covering Criterion

Assume the discrete-time hidden Markov model given in Section VI.⁵ Given the observation sequence $r(D)$, let $u(D)$ and $\tilde{u}(D)$ be two Markov state sequences whose corresponding processed state sequences are $y(D)$ and $\tilde{y}(D)$, respectively. If we can find two time indices $d_1 < d_2$ such that $u[d_1] = \tilde{u}[d_1]$, $u[d_2] = \tilde{u}[d_2]$, and

$$\prod_{d=d_1+1}^{d_2} \frac{f_o(r[d] \mid \tilde{y}[d])\, P_t(\tilde{u}[d] \mid \tilde{u}[d-1])}{f_o(r[d] \mid y[d])\, P_t(u[d] \mid u[d-1])} > 1, \tag{43}$$

we say $\tilde{u}(D)$ "covers" $u(D)$.

Path Covering Criterion: Markov state sequence $u(D)$ cannot be the ML sequence if we can find another state sequence $\tilde{u}(D)$ that covers $u(D)$.

The proof of the PCC is skipped since it is quite well known [8]. 

We say $\tilde{u}(D)$ is a "cover" path with respect to Markov states $u[d_1]$ and $u[d_2]$ at time indices $d_1 < d_2$ if, among all Markov paths passing through $u[d_1]$ and $u[d_2]$, $\tilde{u}(D)$ maximizes $\sum_{d=d_1+1}^{d_2} \log\left(f_o(r[d] \mid \tilde{y}[d])\, P_t(\tilde{u}[d] \mid \tilde{u}[d-1])\right)$. Assume all Markov paths start from $u[-1] = 0$. We say $\tilde{u}(D)$ is a "cover" path with respect to Markov state $u[d_1]$ at time index $d_1 \ge 0$ if, among all Markov paths passing through $u[d_1]$, $\tilde{u}(D)$ maximizes $\sum_{d=0}^{d_1} \log\left(f_o(r[d] \mid \tilde{y}[d])\, P_t(\tilde{u}[d] \mid \tilde{u}[d-1])\right)$.
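The covering relation (43) amounts to comparing path log likelihoods over $(d_1, d_2]$ between two state sequences that agree at $d_1$ and $d_2$. The sketch below is a hypothetical helper that does exactly this for user-supplied log-density and log-transition functions, pairing $r[d]$ with the processed state at time $d$ as in (34); it is an illustration, not code taken from [8]. The toy model in the usage lines (scalar states, $y[d] = u[d]$, equiprobable transitions) is an assumption.

```python
def segment_log_likelihood(r, u, y, d1, d2, log_fo, log_pt):
    """sum_{d=d1+1}^{d2} [log f_o(r[d] | y[d]) + log P_t(u[d] | u[d-1])]."""
    return sum(log_fo(r[d], y[d]) + log_pt(u[d], u[d - 1]) for d in range(d1 + 1, d2 + 1))

def covers(r, u_cov, y_cov, u, y, d1, d2, log_fo, log_pt):
    """True if u_cov(D) covers u(D) between d1 < d2 in the sense of (43).

    log_pt(current, previous) must return log P_t(current | previous).
    """
    if not (d1 < d2 and u_cov[d1] == u[d1] and u_cov[d2] == u[d2]):
        return False
    return (segment_log_likelihood(r, u_cov, y_cov, d1, d2, log_fo, log_pt)
            > segment_log_likelihood(r, u, y, d1, d2, log_fo, log_pt))

def log_fo(r_d, y_d):
    return -(r_d - y_d) ** 2          # Gaussian-like log density, up to a constant

def log_pt(current, previous):
    return -0.7                       # equiprobable transitions (assumed)

r = [0.0, 0.1, -0.1]
print(covers(r, [0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 1, 0], 0, 2, log_fo, log_pt))  # True
```

By the PCC, the second sequence above can be discarded without further search, which is the pruning the VA performs at every state.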

⁵It is shown in Appendix D that the model is satisfied by the communication system given in Section II.
B. Examples of SLL-based Optimality Tests Satisfying Assumption 1

In [12][11], when the decoder branches a Markov path at time index $m < N$, the branch is characterized by a partial message $\{x[0], x[1], \ldots, x[m]\}$. For any codeword $y(D)$ associated with the branch, we have

$$y[d] = \sum_{l=0}^{\nu-1} x[d-l]G[l], \quad \forall d \in [0, m]. \tag{44}$$

In other words, $D_y = [0, m]$. The negative SLL lower bound is given by

$$S_y(y(D)) = \sum_{d=0}^{N+\nu-1} \left\| r[d] - g_q(y[d]) \right\|^2 \ge \sum_{d=0}^{m} \left\| r[d] - g_q\!\left(\sum_{l=0}^{\nu-1} x[d-l]G[l]\right) \right\|^2, \tag{45}$$

which satisfies Assumption 1 with $\epsilon = 1$.
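The bound (45) keeps the terms of the negative SLL that are fixed by the branch prefix $x[0], \ldots, x[m]$ and drops the remaining nonnegative terms. A small sketch under the same illustrative assumptions as before (binary $k = 1$ code, antipodal mapping $0 \to +1$, $1 \to -1$, hypothetical helper name `prefix_sll`) makes this concrete and checks that the bound never exceeds the negative SLL of any completion of the prefix.

```python
import itertools
import numpy as np

def prefix_sll(r, taps, prefix):
    """Right-hand side of (45) for the prefix x[0..m] (binary, k = 1, 0 -> +1, 1 -> -1)."""
    taps = np.asarray(taps, dtype=int)
    nu = taps.shape[0]
    total = 0.0
    for d in range(len(prefix)):                  # only y[0..m] are fixed by the prefix
        y = sum(prefix[d - l] * taps[l] for l in range(nu) if d - l >= 0) % 2
        total += float(np.sum((np.asarray(r[d], dtype=float) - (1.0 - 2.0 * y)) ** 2))
    return total

taps = [[1, 1], [1, 0], [1, 1]]
r = [[-1, -1], [-1, 1], [-0.2, 1], [1, -1], [1, -1], [-1, -1]]
bound = prefix_sll(r, taps, [1, 0])
# Full negative SLL of a completed message: append nu - 1 = 2 tail zeros.
full = [prefix_sll(r, taps, [1, 0, a, b, 0, 0]) for a, b in itertools.product((0, 1), repeat=2)]
print(bound, min(full) >= bound)   # 0.0 True
```

Passing a complete message padded with $\nu - 1$ tail zeros makes the same function return the full negative SLL (5), which is why the comparison above is a direct check of the inequality in (45).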

In [7], several SLL-based OTCs were presented for decoding block codes. The decoder obtains a first guess $\hat{y}(D)$ of the codeword. A negative SLL lower bound $\underline{S}_y$, satisfying $\underline{S}_y \le S_y(y(D))$ for all $y(D) \ne \hat{y}(D)$, is then developed for the codeword set $\{y(D) \ne \hat{y}(D)\}$, which corresponds to the case of $D_x$ being an empty set in the context of Section III. $\hat{y}(D)$ is optimal if the optimality test $\underline{S}_y > S_y(\hat{y}(D))$ gives a positive answer [7].

The lower bounds $\underline{S}_y$ presented in [7, Section III] satisfy the following inequality,

$$\underline{S}_y \le \min_{y(D) \ne \hat{y}(D)} \sum_{d} \left\| g_q(y[d]) - g_q(\hat{y}[d]) \right\|^2. \tag{46}$$

Since the coding constraint is $\nu$, we can always find a codeword $y(D) \ne \hat{y}(D)$ differing from $\hat{y}(D)$ in no more than $\nu$ codeword symbols. This implies that the right-hand side of (46) can be upper bounded by a constant, denoted by $U_1$, which is not a function of $N$:

$$\min_{y(D) \ne \hat{y}(D)} \sum_{d} \left\| g_q(y[d]) - g_q(\hat{y}[d]) \right\|^2 \le U_1. \tag{47}$$

Consequently, given $\mathrm{SNR} > 0$ and $0 < \epsilon < 1$, there exists a constant $N_0$ such that Assumption 1 is satisfied for $N > N_0$.

C. Proof of Lemma 2

Proof: Assume that, in searching for the ML codeword, the decoder successfully avoided visiting a Markov state specified by $\{x_0[d-\nu+1], \ldots, x_0[d]\}$. This implies that we can find two time index sets, $D_0 \subset [d-\nu+1, d]$ and $D_1$ with $D_1 \cap [d-\nu+1, d] = \emptyset$, such that the optimality of all message sets $\{x(D_0 \cup D_1)\}$ with $x[d'] = x_0[d']$, $\forall d' \in D_0$, is disproved. We choose $D_1$ with the maximum cardinality while making sure that, in disproving the optimality of $\{x_0[d-\nu+1], \ldots, x_0[d]\}$, the detector visited all the Markov states $\{x[d'-\nu+1], \ldots, x[d']\}$ satisfying $[d'-\nu+1, d'] \subset D_1$.

According to the definitions of $D_0$ and $D_x$, the decoder needs to disprove the optimality of a special message set $\{x_0(D_0 \cup D_x)\}$ whose symbols equal $x_0[d]$ for $d \in D_0$ and satisfy $x_0[d] = x[d]$ for $d \in D_x$. The definition of $D_x$ also implies that the decoder needs to obtain a lower bound $\underline{S}\big(x(D_0 \cup D_x)\big)$ of the negative SLLs of the messages in $\{x(D_0 \cup D_x)\}$. This lower bound can only be a function of the partial message $x(D_0 \cup D_x)$; it cannot depend on any source message symbol whose time index lies outside $D_0 \cup D_x$. However, since the corresponding set $D_y$ (defined earlier) of $\{x_0(D_0 \cup D_x)\}$ satisfies $|D_y| < 2\nu$, according to Lemma 1, the probability of disproving the optimality of $\{x_0(D_0 \cup D_x)\}$ (using an SLL-based optimality test) is low if $N - |D_0 \cup D_x| > 2\nu$.

To make the argument explicit, the fact that the decoder visits all Markov states $\{x[d'-\nu+1], \ldots, x[d']\}$ with $[d'-\nu+1, d'] \subset D_x$ implies

$$C_{su} \ge \frac{|D_x| + |D_0|}{N}\,C_a. \tag{48}$$

According to Lemma 1, for any positive constant $\delta > 0$, if we fix all other parameters and take $N$ to infinity, we have²

$$\lim_{N\to\infty} P\left\{\frac{N - |D_x| - |D_0|}{N} < \delta\right\} = 1. \tag{49}$$



Combining (48) and (49), we get



$$\lim_{N\to\infty} P\big\{C_{su} > (1-\delta)\,C_a\big\} = 1. \tag{50}$$



Since (50) holds for any fixed SNR, it still holds if we take SNR to infinity after taking $N$ to infinity, i.e.,

$$\lim_{\mathrm{SNR}\to\infty}\lim_{N\to\infty} P\big\{C_{su} > (1-\delta)\,C_a\big\} = 1. \tag{51}$$



D. The Hidden Markov Model and Its Key Assumptions 

In this section, we show that the communication system presented in Section II satisfies the discrete-time hidden Markov model and the key assumptions given in Section VII.

Consider the communication system modeled in Section II. Define $u[d] = [x[d-\nu+1], \ldots, x[d]]$. It is easy to see that $u(D)$ is a Markov sequence. The processed state $y[d] = y(u[d])$ is only a function of the corresponding Markov state. If two Markov states at successive time indices take the form

$$\begin{aligned}
u[d] &= [x[d-\nu+1], \ldots, x[d]],\\
u[d+1] &= [x[d-\nu+2], \ldots, x[d+1]],
\end{aligned}\tag{52}$$

for some $x(D)$, then we have



Ptiu[d+l]\u[d]) = ^. (53) 



² An equivalent statement of (49) is: if $N - |D_x| - |D_0| \ge \delta N$, then, as $N \to \infty$, the probability of disproving the optimality of all message sets $\{x(D_0 \cup D_x)\}$ with $x[d] = x_0[d]$, $\forall d \in D_0$, using an SLL-based optimality test goes to zero.
Otherwise, $P_t(u[d+1]\mid u[d]) = 0$. According to (53), $p_{tr}$ is given by the nonzero transition probability in (53).

Since $u[d] = [x[d-\nu+1], \ldots, x[d]]$ does not depend on source symbols at time indices $m \le d-\nu$, we know that

$$P_t(u[d]\mid u[d-\nu]) \neq 0, \quad \forall\, u[d], u[d-\nu]. \tag{54}$$
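The transition structure in (52) and the reachability property (54) can be checked mechanically for a small example. The sketch assumes a binary source, so each state has two successors (each with probability 1/2 if the source is also assumed uniform); the helper names are hypothetical.

```python
# Illustrative sketch; all names here are hypothetical, not from the paper.
from itertools import product

def successors(state):
    """Shift-register transition (52): drop x[d-nu+1], append x[d+1].

    A binary source is assumed, so each state has two successors.
    """
    return [state[1:] + (b,) for b in (0, 1)]

def reachable_in(state, steps):
    """Set of states reachable from `state` in exactly `steps` transitions."""
    frontier = {state}
    for _ in range(steps):
        frontier = {nxt for s in frontier for nxt in successors(s)}
    return frontier

def all_states(nu):
    """All 2**nu Markov states u = (x[d-nu+1], ..., x[d])."""
    return set(product((0, 1), repeat=nu))
```

For $\nu = 3$, every one of the eight states is reachable from every state in exactly three steps, but not in two, which is the content of (54).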

The observation density is given by

$$f_o(r\mid y) = \sqrt{\frac{\mathrm{SNR}}{2\pi}}\;\exp\!\Big(-\frac{\mathrm{SNR}}{2}\,\big\|r - g_q(y)\big\|^2\Big). \tag{55}$$

Next, we show that Assumption 2 is satisfied. Let $u(D)$ and $\tilde{u}(D)$ be two Markov state sequences. Let $x(D)$ and $y(D)$ be the source message and the codeword corresponding to $u(D)$, and let $\tilde{x}(D)$ and $\tilde{y}(D)$ be the source message and the codeword corresponding to $\tilde{u}(D)$. For a time index $d$, if $u[d] \neq \tilde{u}[d]$, we can find a time index $m \in (d-\nu, d]$ such that $x[m] \neq \tilde{x}[m]$. Consequently, according to [14, Corollary 2], we can find a time index $\tilde{m} \in [m, m+\nu)$ such that $y[\tilde{m}] \neq \tilde{y}[\tilde{m}]$. Therefore, Assumption 2 holds because $\tilde{m} \in (d-\nu, d+\nu)$.
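Corollary 2 of [14] is not reproduced here. As a sanity check of Assumption 2 for one example only, the sketch below brute-forces all pairs of short binary messages for the rate-1/2 code with octal generators (7, 5), which we assume as the example code with $\nu = 3$ taps and a zero tail, and verifies that every state difference at time $d$ produces a codeword difference inside $(d-\nu, d+\nu)$. The names are hypothetical.

```python
# Illustrative sketch; all names here are hypothetical, not from the paper.
from itertools import product

G = ((1, 1), (1, 0), (1, 1))  # assumed example: (7,5)_8 code, taps G[0..2]
NU = len(G)

def encode(x):
    """y[d] = sum_l x[d-l] G[l] mod 2, zero tail so len(y) = len(x) + NU - 1."""
    xt = list(x) + [0] * (NU - 1)
    return [tuple(sum(xt[d - l] * G[l][j] for l in range(NU) if d - l >= 0) % 2
                  for j in range(len(G[0])))
            for d in range(len(xt))]

def state(x, d):
    """u[d] = (x[d-NU+1], ..., x[d]) with x[t] = 0 for t < 0."""
    return tuple(x[t] if t >= 0 else 0 for t in range(d - NU + 1, d + 1))

def assumption2_holds(length):
    """True if u[d] != u~[d] always implies y[t] != y~[t] for some t in (d-NU, d+NU)."""
    msgs = list(product((0, 1), repeat=length))
    code = {x: encode(x) for x in msgs}
    for x, xt in product(msgs, repeat=2):
        y, yt = code[x], code[xt]
        for d in range(length):
            if state(x, d) != state(xt, d):
                window = range(max(0, d - NU + 1), min(len(y), d + NU))
                if all(y[t] == yt[t] for t in window):
                    return False
    return True
```

A passing check for one code and one message length is evidence for this example only, not a proof for general convolutional codes.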

Let $d_{\min}$ and $d_{\max}$ be defined as in Theorem 1. Let $y_1 \neq y_2$ be two arbitrary codeword symbols. We have the following triangle inequalities:



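$$\begin{aligned}
\|r - g_q(y_2)\| &\ge \|g_q(y_2) - g_q(y_1)\| - \|r - g_q(y_1)\|,\\
\|r - g_q(y_2)\| &\le \|g_q(y_2) - g_q(y_1)\| + \|r - g_q(y_1)\|.
\end{aligned}\tag{56}$$

The first inequality in (56) implies

$$\begin{aligned}
\min_{y_2:\,y_2\neq y_1}\big[-\log f_o(r\mid y_2)\big] + \log f_o(r\mid y_1)
&= \min_{y_2:\,y_2\neq y_1}\frac{\mathrm{SNR}}{2}\Big(\|r - g_q(y_2)\|^2 - \|r - g_q(y_1)\|^2\Big)\\
&\ge \frac{\mathrm{SNR}}{2}\,d_{\min}\big(d_{\min} - 2\|r - g_q(y_1)\|\big).
\end{aligned}\tag{57}$$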



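The second inequality in (56) implies

$$\begin{aligned}
\max_{y_2\neq y_3}\big[-\log f_o(r\mid y_2) + \log f_o(r\mid y_3)\big]
&= \max_{y_2\neq y_3}\frac{\mathrm{SNR}}{2}\Big(\|r - g_q(y_2)\|^2 - \|r - g_q(y_3)\|^2\Big)\\
&\le \max_{y_2}\frac{\mathrm{SNR}}{2}\,\|r - g_q(y_2)\|^2\\
&\le \max_{y_2}\mathrm{SNR}\Big(\|r - g_q(y_1)\|^2 + \|g_q(y_2) - g_q(y_1)\|^2\Big)\\
&\le \mathrm{SNR}\big(\|r - g_q(y_1)\|^2 + d_{\max}^2\big).
\end{aligned}\tag{58}$$

Therefore, Assumption 3 is satisfied by defining

$$\begin{aligned}
L_l(r, y_1) &= \frac{\mathrm{SNR}}{2}\,d_{\min}\big(d_{\min} - 2\|r - g_q(y_1)\|\big),\\
L_u(r, y_1) &= \mathrm{SNR}\big(\|r - g_q(y_1)\|^2 + d_{\max}^2\big).
\end{aligned}\tag{59}$$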

Note that evaluating $L_l(r, y_1)$ and $L_u(r, y_1)$ does not involve visiting any processed state other than $y_1$.
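The bounds in (59) can be checked numerically for a specific signal set. The sketch assumes four signal points $g_q(y) \in \{(\pm 1, \pm 1)\}$ and drops the normalizing constant of (55), which cancels in every log-likelihood difference; the names are hypothetical. Note that the squaring step behind the first inequality in (56) needs $\|r - g_q(y_1)\| \le d_{\min}$, so the sketch checks $L_l$ only for such $r$, while $L_u$ is checked for every sampled $r$.

```python
# Illustrative sketch; all names here are hypothetical, not from the paper.
import numpy as np

def log_fo(r, s, snr):
    """log f_o(r|y) from (55) without its normalizing constant, s = g_q(y)."""
    return -0.5 * snr * float(np.sum((r - s) ** 2))

def bounds(r, s1, points, snr):
    """Return (L_l, L_u, d_min) of (59) for the symbol y_1 with g_q(y_1) = s1."""
    dists = [float(np.linalg.norm(a - b))
             for i, a in enumerate(points) for b in points[i + 1:]]
    d_min, d_max = min(dists), max(dists)
    dist_r = float(np.linalg.norm(r - s1))  # ||r - g_q(y_1)||
    L_l = 0.5 * snr * d_min * (d_min - 2.0 * dist_r)
    L_u = snr * (dist_r ** 2 + d_max ** 2)
    return L_l, L_u, d_min
```

Neither bound requires evaluating the likelihood of any symbol other than $y_1$, which is the property used in the NLL-based tests.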
If $y[d]$ and $r[d]$ are the actual codeword symbol and the channel output at time index $d$, then $\|r[d] - g_q(y[d])\| = \|w[d]\|$, where $\|w[d]\|^2$ is a scaled $\chi^2$ random variable whose mean and variance depend on the SNR but not on $N$. From (59), it is easily seen that Assumption 4 is satisfied with $d_{\min} > 0$ and $d_{\max} < \infty$.

E. Proof of Theorem 3

Proof: Let $\tilde{u}(D)$ be an arbitrary Markov state sequence with corresponding processed state sequence $\tilde{y}(D)$. Assume

$$\tilde{u}[m+\nu-1] \neq \hat{u}[m+\nu-1]. \tag{60}$$

Theorem 3 holds if we can prove that any $\tilde{u}(D)$ satisfying (60) cannot be the ML state sequence. Let $k$ denote a positive integer. Define two integers $K_l$ and $K_r$ as follows:

$$\begin{aligned}
K_l &= \arg\min_{k>0}\big\{\tilde{u}[m+\nu-1-k\nu] = \hat{u}[m+\nu-1-k\nu]\big\},\\
K_r &= \arg\min_{k>0}\big\{\tilde{u}[m+\nu-1+k\nu] = \hat{u}[m+\nu-1+k\nu]\big\}.
\end{aligned}\tag{61}$$

We consider the following four cases, based on the values of $K_l$ and $K_r$. In all four cases, we show that $\tilde{u}(D)$ cannot be the ML sequence.
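Computing $K_l$ and $K_r$ from (61) and deciding which case applies is mechanical. The sketch below assumes the two state sequences are given as Python lists indexed by time, uses the non-strict boundaries $K_l \le 2M+1$ and $K_r \le 2M-1$ so that the four cases are exhaustive, and treats a side without a match inside the available range as "large". The function names are hypothetical.

```python
# Illustrative sketch; all names here are hypothetical, not from the paper.
def k_left_right(u_tilde, u_hat, m, nu):
    """K_l and K_r of (61) for state sequences given as lists indexed by time.

    Returns None for a side on which no match occurs inside the lists.
    """
    t0 = m + nu - 1
    if u_tilde[t0] == u_hat[t0]:
        raise ValueError("(60) requires the states to differ at m + nu - 1")

    def first_match(sign):
        k = 1
        while 0 <= t0 + sign * k * nu < len(u_hat):
            t = t0 + sign * k * nu
            if u_tilde[t] == u_hat[t]:
                return k
            k += 1
        return None

    return first_match(-1), first_match(+1)

def proof_case(K_l, K_r, M):
    """Case number (1 to 4) of the proof; None is treated as a large value."""
    left_short = K_l is not None and K_l <= 2 * M + 1
    right_short = K_r is not None and K_r <= 2 * M - 1
    return {(True, True): 1, (True, False): 2,
            (False, True): 3, (False, False): 4}[(left_short, right_short)]
```

Only the states at the time indices $m+\nu-1+k\nu$ are compared, which is why the proof advances in steps of $\nu$.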

Case 1: $K_l \le 2M+1$, $K_r \le 2M-1$.

Since $\tilde{u}[m+\nu-1+k\nu] \neq \hat{u}[m+\nu-1+k\nu]$ for all $-K_l < k < K_r$, according to Assumption 2, $\tilde{y}(D)$ and $\hat{y}(D)$ differ at no fewer than $\lfloor (K_l+K_r)/2\rfloor$ time indices in the time interval $[m+\nu-K_l\nu,\ m+\nu+K_r\nu)$, where $\lfloor x\rfloor$ denotes the largest integer no larger than $x$. According to (33) and (36), for $d \in [m-2M\nu,\ m+2M\nu)$, if $\tilde{y}[d] \neq \hat{y}[d]$, we have

$$-\log\frac{f_o(r[d]\mid\tilde{y}[d])}{f_o(r[d]\mid\hat{y}[d])} \ge 3\nu(\rho - \log p_{tr}). \tag{62}$$



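Consequently, we get

$$\begin{aligned}
\sum_{d=m+\nu-K_l\nu}^{m+\nu+K_r\nu-1}\log\frac{f_o(r[d]\mid\hat{y}[d])\,P_t(\hat{u}[d]\mid\hat{u}[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])}
&\ge \Big\lfloor\frac{K_l+K_r}{2}\Big\rfloor 3\nu(\rho - \log p_{tr}) + (K_r+K_l)\,\nu\log p_{tr}\\
&\ge \Big\lfloor\frac{K_l+K_r}{2}\Big\rfloor 3\nu\rho > 0.
\end{aligned}\tag{63}$$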

According to the PCC presented in Appendix A, (63) implies that $\hat{u}(D)$ "covers"³ $\tilde{u}(D)$. Hence $\tilde{u}(D)$ cannot be the ML sequence.
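The last inequality in (63) uses only $K_l, K_r \ge 1$ (both are minimizers over $k > 0$) and $\log p_{tr} \le 0$. With $s = K_l + K_r \ge 2$, the step can be spelled out as

$$\Big\lfloor\frac{s}{2}\Big\rfloor 3\nu(\rho - \log p_{tr}) + s\nu\log p_{tr} = 3\Big\lfloor\frac{s}{2}\Big\rfloor\nu\rho + \Big(s - 3\Big\lfloor\frac{s}{2}\Big\rfloor\Big)\nu\log p_{tr} \ge 3\Big\lfloor\frac{s}{2}\Big\rfloor\nu\rho,$$

because $3\lfloor s/2\rfloor \ge s$ for every integer $s \ge 2$, so the second term is the product of a nonpositive factor and $\nu\log p_{tr} \le 0$. The final strict inequality then follows for $\rho > 0$.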

Case 2: $K_l \le 2M+1$, $K_r > 2M-1$.

In this case, we construct a Markov sequence $u_c(D)$ and show that $u_c(D)$ covers $\tilde{u}(D)$.

³ See definition in Appendix A.
$u_c(D)$ is constructed as follows:

$$\begin{aligned}
u_c[d] &= \hat{u}[d], \quad \text{for } d < m+2M\nu,\\
u_c[d] &= \tilde{u}[d], \quad \text{for } d \ge m+(2M+1)\nu.
\end{aligned}\tag{64}$$

According to (31), we can always construct $u_c[d]$ for $d \in [m+2M\nu,\ m+(2M+1)\nu)$ so that (64) is satisfied. Let $y_c(D)$ be the processed state sequence corresponding to $u_c(D)$.
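For the shift-register states of Appendix D, one convenient way to realize the bridge in (64) is to splice the two source messages rather than the states. The sketch below is a minimal illustration under that assumption (binary lists, hypothetical names): switching messages at $t_{\text{switch}} = m + 2M\nu$ gives $u_c[d] = \hat{u}[d]$ for $d < m + 2M\nu$ and $u_c[d] = \tilde{u}[d]$ for $d \ge m + (2M+1)\nu - 1$.

```python
# Illustrative sketch; all names here are hypothetical, not from the paper.
def splice_messages(x_hat, x_tilde, t_switch):
    """x_c[d] = x_hat[d] for d < t_switch and x_c[d] = x_tilde[d] otherwise."""
    return list(x_hat[:t_switch]) + list(x_tilde[t_switch:])

def shift_states(x, nu):
    """u[d] = (x[d-nu+1], ..., x[d]) for every d, with x[t] = 0 for t < 0."""
    return [tuple(x[t] if t >= 0 else 0 for t in range(d - nu + 1, d + 1))
            for d in range(len(x))]
```

The mirrored construction (68) of Case 3 is the same call with the two messages exchanged and $t_{\text{switch}} = m - (2M+1)\nu$.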
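From (33) and the first inequality in (37), we get

$$\begin{aligned}
\sum_{d=m+2M\nu}^{m+(2M+1)\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])}
&\ge -\sum_{d=m+2M\nu}^{m+(2M+1)\nu-1} L_u(r[d], \hat{y}[d]) + (\nu+1)\log p_{tr}\\
&\ge -3M\nu\rho.
\end{aligned}\tag{65}$$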

Since $\tilde{u}[m+\nu-1+k\nu] \neq u_c[m+\nu-1+k\nu]$ for all $-K_l < k < 2M-1$, according to Assumption 2, $\tilde{y}(D)$ and $\hat{y}(D)$ differ at no fewer than $\lfloor (K_l+2M-1)/2\rfloor$ time indices in the time interval $[m+\nu-K_l\nu,\ m+2M\nu)$. According to (33) and (36), we have

$$\begin{aligned}
\sum_{d=m+\nu-K_l\nu}^{m+2M\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])}
&\ge \Big\lfloor\frac{K_l+2M-1}{2}\Big\rfloor 3\nu(\rho - \log p_{tr}) + (K_l+2M-1)\,\nu\log p_{tr}\\
&\ge 3M\nu(\rho - \log p_{tr}) + (2M+1)\,\nu\log p_{tr} \ge 3M\nu\rho.
\end{aligned}\tag{66}$$

Combining (65) and (66), we obtain

$$\sum_{d=m+\nu-K_l\nu}^{m+(2M+1)\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])} \ge 0. \tag{67}$$

(67) implies that $u_c(D)$ covers $\tilde{u}(D)$. Hence, according to the PCC, $\tilde{u}(D)$ cannot be the ML sequence.
Case 3: $K_l > 2M+1$, $K_r \le 2M-1$.

Similar to Case 2, we construct a Markov sequence $u_c(D)$ and show that $u_c(D)$ covers $\tilde{u}(D)$. $u_c(D)$ is constructed as follows:

$$\begin{aligned}
u_c[d] &= \hat{u}[d], \quad \text{for } d \ge m-2M\nu,\\
u_c[d] &= \tilde{u}[d], \quad \text{for } d < m-(2M+1)\nu.
\end{aligned}\tag{68}$$

According to (31), we can always construct $u_c[d]$ for $d \in [m-(2M+1)\nu,\ m-2M\nu)$ so that (68) is satisfied. Let $y_c(D)$ be the processed state sequence corresponding to $u_c(D)$.

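From (33) and the second inequality in (37), we get

$$\begin{aligned}
\sum_{d=m-(2M+1)\nu}^{m-2M\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])}
&\ge -\sum_{d=m-(2M+1)\nu}^{m-2M\nu-1} L_u(r[d], \hat{y}[d]) + \nu\log p_{tr}\\
&\ge -3M\nu\rho.
\end{aligned}\tag{69}$$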
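Since $\tilde{u}[m+\nu-1+k\nu] \neq u_c[m+\nu-1+k\nu]$ for all $-2M-1 < k < K_r$, according to Assumption 2, $\tilde{y}(D)$ and $\hat{y}(D)$ differ at no fewer than $\lfloor (2M+1+K_r)/2\rfloor$ time indices in the time interval $[m-2M\nu,\ m+\nu+K_r\nu)$. According to (33) and (36), we have

$$\begin{aligned}
\sum_{d=m-2M\nu}^{m+\nu+K_r\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])}
&\ge \Big\lfloor\frac{2M+1+K_r}{2}\Big\rfloor 3\nu(\rho - \log p_{tr}) + (2M+1+K_r)\,\nu\log p_{tr}\\
&\ge 3(M+1)\nu\rho.
\end{aligned}\tag{70}$$

Combining (69) and (70), we obtain

$$\sum_{d=m-(2M+1)\nu}^{m+\nu+K_r\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])} \ge -3M\nu\rho + 3(M+1)\nu\rho = 3\nu\rho > 0. \tag{71}$$

(71) implies that $u_c(D)$ covers $\tilde{u}(D)$. Hence, according to the PCC, $\tilde{u}(D)$ cannot be the ML sequence.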

Case 4: $K_l > 2M+1$, $K_r > 2M-1$.

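We construct a Markov state sequence $u_c(D)$ as follows:

$$\begin{aligned}
u_c[d] &= \hat{u}[d], \quad \text{for } m-2M\nu \le d < m+2M\nu,\\
u_c[d] &= \tilde{u}[d], \quad \text{for } d \ge m+(2M+1)\nu,\\
u_c[d] &= \tilde{u}[d], \quad \text{for } d < m-(2M+1)\nu.
\end{aligned}\tag{72}$$

Let the processed state sequence corresponding to $u_c(D)$ be $y_c(D)$.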

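Since $\tilde{u}[m+\nu-1+k\nu] \neq u_c[m+\nu-1+k\nu]$ for all $-2M-1 < k < 2M-1$, according to Assumption 2, $\tilde{y}(D)$ and $\hat{y}(D)$ differ at no fewer than $\lfloor (4M+1)/2\rfloor$ time indices in the time interval $[m-2M\nu,\ m+2M\nu)$. According to (33) and (36), we have

$$\begin{aligned}
\sum_{d=m-2M\nu}^{m+2M\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])}
&\ge \Big\lfloor\frac{4M+1}{2}\Big\rfloor 3\nu(\rho - \log p_{tr}) + 4M\nu\log p_{tr}\\
&\ge 6M\nu\rho.
\end{aligned}\tag{73}$$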



Meanwhile, it is easily seen that (65) and (69) hold. Combining (65), (69) and (73), we obtain

$$\sum_{d=m-(2M+1)\nu}^{m+(2M+1)\nu-1}\log\frac{f_o(r[d]\mid y_c[d])\,P_t(u_c[d]\mid u_c[d-1])}{f_o(r[d]\mid\tilde{y}[d])\,P_t(\tilde{u}[d]\mid\tilde{u}[d-1])} \ge -3M\nu\rho - 3M\nu\rho + 6M\nu\rho = 0. \tag{74}$$

(74) implies that $u_c(D)$ covers $\tilde{u}(D)$. Hence, according to the PCC, $\tilde{u}(D)$ cannot be the ML sequence.

Overall, we showed that $\tilde{u}(D)$ cannot be the ML sequence irrespective of the values of $K_l$ and $K_r$. Therefore, the ML state sequence must agree with $\hat{u}(D)$ at time index $m+\nu-1$. ■
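As a numerical sanity check of the arithmetic only (not a replacement for the proof), the sketch below evaluates the common middle expression $\lfloor s/2\rfloor 3\nu(\rho - \log p_{tr}) + s\nu\log p_{tr}$ of (63), (66), (70) and (73) on a small parameter grid, assuming $\rho > 0$ and $0 < p_{tr} \le 1$, and confirms the lower bound used in each case. The grid values and function names are hypothetical.

```python
# Illustrative sketch; all names and grid values here are hypothetical.
import math
from itertools import product

def chain_bound(s, nu, rho, p_tr):
    """floor(s/2) * 3*nu*(rho - log p_tr) + s*nu*log p_tr."""
    log_p = math.log(p_tr)
    return (s // 2) * 3 * nu * (rho - log_p) + s * nu * log_p

def check_cases(max_k=12, tol=1e-9):
    """Check the case-wise lower bounds for K_l, K_r in 1..max_k."""
    grid = product((1, 2, 5), (1, 2, 3), (0.1, 1.0, 4.0), (0.01, 0.5, 1.0))
    for nu, M, rho, p_tr in grid:
        for k1, k2 in product(range(1, max_k + 1), repeat=2):
            # Case 1, (63): s = K_l + K_r must give a positive value
            if chain_bound(k1 + k2, nu, rho, p_tr) <= 0:
                return False
        for k in range(1, max_k + 1):
            # Case 2, (66): s = K_l + 2M - 1, bound 3*M*nu*rho
            if chain_bound(k + 2 * M - 1, nu, rho, p_tr) < 3 * M * nu * rho - tol:
                return False
            # Case 3, (70): s = 2M + 1 + K_r, bound 3*(M+1)*nu*rho
            if chain_bound(2 * M + 1 + k, nu, rho, p_tr) < 3 * (M + 1) * nu * rho - tol:
                return False
        # Case 4, (73): s = 4M + 1, bound 6*M*nu*rho
        if chain_bound(4 * M + 1, nu, rho, p_tr) < 6 * M * nu * rho - tol:
            return False
    return True
```

Together with (65) and (69), these per-case bounds give the totals $0$, $3\nu\rho$ and $0$ that appear in (67), (71) and (74).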